Visual Salience and Reference Resolution in Situated Dialogues: A Corpus-based Evaluation
Authors
Abstract
Dialogues between humans and robots are necessarily situated. Exophoric references to objects in the shared visual context are very frequent in situated dialogues, for example when a human is verbally guiding a tele-operated mobile robot. We present an approach to automatically resolving exophoric referring expressions in a situated dialogue based on the visual salience of possible referents. We evaluate the effectiveness of this approach and a range of different salience metrics using data from the SCARE corpus, which we have augmented with visual information. The results of our evaluation show that our computationally lightweight approach is successful, and so promising for use in human-robot dialogue.
Similar resources
Incorporating Extra-Linguistic Information into Reference Resolution in Collaborative Task Dialogue
This paper proposes an approach to reference resolution in situated dialogues by exploiting extra-linguistic information. Recently, investigations of referential behaviours involved in situations in the real world have received increasing attention by researchers (Di Eugenio et al., 2000; Byron, 2005; van Deemter, 2007; Spanger et al., 2009). In order to create an accurate reference resolution ...
Reference Resolution in Situated Dialogue with Learned Semantics
Understanding situated dialogue requires identifying referents in the environment to which the dialogue participants refer. This reference resolution problem, often in a complex environment with high ambiguity, is very challenging. We propose an approach that addresses those challenges by combining learned semantic structure of referring expressions with dialogue history into a ranking-based mo...
What's There to Talk About? A Multi-Modal Model of Referring Behavior in the Presence of Shared Visual Information
This paper describes the development of a rule-based computational model that describes how a feature-based representation of shared visual information combines with linguistic cues to enable effective reference resolution. This work explores a language-only model, a visual-only model, and an integrated model of reference resolution and applies them to a corpus of transcribed task-oriented spoke...
Dynamically structuring, updating and interrelating representations of visual and linguistic discourse context
The fundamental claim of this paper is that salience—both visual and linguistic—is an important overarching semantic category structuring visually situated discourse. Based on this we argue that computer systems attempting to model the evolving context of a visually situated discourse should integrate models of visual and linguistic salience within their natural language processing (NLP) framew...
PentoRef: A Corpus of Spoken References in Task-oriented Dialogues
PentoRef is a corpus of task-oriented dialogues collected in systematically manipulated settings. The corpus is multilingual, with English and German sections, and overall comprises more than 20000 utterances. The dialogues are fully transcribed and annotated with referring expressions mapped to objects in corresponding visual scenes, which makes the corpus a rich resource for research on spoke...